A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice

نویسندگان

  • Laval Jacquin
  • Tuong-Vi Cao
  • Nourollah Ahmadi
چکیده

One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods; least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Predictive Ability of Statistical Genomic Prediction Methods When Underlying Genetic Architecture of Trait Is Purely Additive

A simulation study was conducted to address the issue of how purely additive (simple) genetic architecture might impact on the efficacy of parametric and non-parametric genomic prediction methods. For this purpose, we simulated a trait with narrow sense heritability h2= 0.3, with only additive genetic effects for 300 loci in order to compare the predictive ability of 14 more practically used ge...

متن کامل

Functional-Coefficient Autoregressive Model and its Application for Prediction of the Iranian Heavy Crude Oil Price

Time series and their methods of analysis are important subjects in statistics. Most of time series have a linear behavior and can be modelled by linear ARIMA models. However, some of realized time series have a nonlinear behavior and for modelling them one needs nonlinear models. For this, many good parametric nonlinear models such as bilinear model, exponential autoregressive model, threshold...

متن کامل

صحت انتخاب ژنومی روش‌های پارامتری و ناپارامتری با معماری‌های ژنتیکی افزایشی و غالبیت

     In most genomic prediction studies only additive effects will be used in models for estimating genomic breeding values (GEBV). However, dominance genetic effects are an important source of variation for complex traits, considering them into account may improve the accuracy of GEBV. In the present  study,  performed applying  simulated data, the effect of  different heritability values (0.1...

متن کامل

تنظیم و کاربرد الگوریتم جنگل تصادفی در ارزیابی ژنومی

One of the most important issues in genomic selection is using a decent method for estimating marker effects and genomic evaluation. Recently, machine learning algorithms which are members of non-parametric and non-linear methods have been extended to genomic evaluation. One of these methods is Random Forest (RF) on which this research was focused. Important parameters in RF algorithm are the n...

متن کامل

تشخیص سرطان پستان با استفاده از برآورد ناپارمتری چگالی احتمال مبتنی بر روش‌‌های هسته‌ای

Introduction: Breast cancer is the most common cancer in women. An accurate and reliable system for early diagnosis of benign or malignant tumors seems necessary. We can design new methods using the results of FNA and data mining and machine learning techniques for early diagnosis of breast cancer which able to detection of breast cancer with high accuracy. Materials and Methods: In this study,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2016